Abstract: In the problem of location anonymity of the events
exposed to a global eavesdropper, we highlight and analyze some 
aspects that are missing in the prior work, which is especially 
relevant for the quality of secure sensing in delay-intolerant appli- 
cations monitoring rare and spatially sparse events, and deployed 
as large wireless sensor networks with single data collector. We 
propose an efficient scheme for generating fake network traffic to 
disguise the real event notification. The efficiency of the scheme 
that provides statistical source location anonymity is achieved 
by partitioning network nodes randomly into several dummy 
source groups. Members of the same group collectively emulate 
both temporal and spatial distribution of the event. Under such 
dummy-traffic framework of the source anonymity protection, 
we aim to better model the global eavesdropper, especially her 
way of using statistical tests to detect the real event, and to 
present the quality of the location protection as relative to the 
adversary's strength. In addition, our approach aims to reduce 
the per-event work spent to generate the fake traffic while, most 
importantly, providing a guaranteed latency in reporting the 
event. The latency is controlled by decoupling the routing from 
the fake-traffic schedule. A good dummy source group design also 
provides a robust protection of event bursts. This is achieved at 
the expense of the significant overhead as the number of dummy 
source groups must be increased to the reciprocal value of the 
false alarm parameter used in the statistical test. We believe 
that the proposed source anonymity protection strategy, and the 
evaluation framework, are well justified by the abundance of 
the applications that monitor a rare event with known temporal 
statistics, and uniform spatial distribution. 

I. Introduction 
Privacy issues are an important aspect of monitoring appli- 
cations in wireless sensor networks (WSNs). A recent survey 
of state-of-the-art research on privacy protection in WSNs [6],
among other problems, reviews strategies to protect the object 
observed by a WSN node, referred to as source, from the 
global eavesdropper (Eve) [5], which can infer the location of
the object based on the established location of the source. 

A. Problem Description 

The observed object may be a smuggler crossing the border,
an important person entering a classified area, or endangered
animals monitored in their habitats. Messages from the source 
are propagated in a traditional hop-by-hop manner, and di- 
rected to a fixed data collector, referred to as a base station, 
or a sink. Sink protection is not an issue as the adversary 
usually knows the sink, in fact she may even know the whole 
topology of the sensor network. In addition, Eve detects the
timing and location of all transmissions in the network (hence 
global); she can hear and capture any packet sent in the 
network (either with a very powerful antenna or she has her 
own sensor network deployed in the area). Eve is powerful: she 
can employ complex statistical algorithms for detection, and 
arbitrary localization techniques. However, the message itself 
is encrypted and Eve does not know the encryption, hence 
she cannot capture the message and infer the object's position
from the content. In addition, Eve, despite being so powerful
and omnipresent, needs to stay invisible. We define outage as
the event in which, following an eavesdropped transmission, Eve
reveals herself by taking actions based on the false suspicion that
an event occurred. The actions may involve the physical presence
of the attacker or her faculties, in order to capture or destroy
the object. Hence, a false alarm presents a risk of personal
exposure and liability.

The adversary infers the source of the transmission from
a change in the traffic pattern; a conspicuous case would
be when a node starts transmitting after a prolonged period of 
inactivity in the WSN. For many event-reporting applications, 
despite the fact that the attacker cannot learn the details from 
the message content, inferring the contextual information, i.e. 
whether, when and where a concerned event has happened, 
may be enough to jeopardize monitored resources. Intuitively, 
the persistent dummy (fake) traffic is the only way to obfuscate 
the events, and a formal proof of this is available in [5].
Dummy packets follow a predefined schedule, aligned with the 
expected timeline of real packets, so that Eve cannot observe 
the change. To better explain the intricacies of this approach, 
especially in light of the existing research, we next introduce 
two models of monitored phenomena. Let us first define the 
application delay as the delay in event reporting. In both 
scenarios, the duration of time is relative to the application 
delay constraint, which is a known value. We assume that the 
WSN is divided into cells, such that each sensor node monitors 
a unique cell and that the events are occurring in a uniform 
manner over time and space. An active event is any event that 
is not reported to the sink yet. 

1) Scenario A: Frequent and Dense Events: This model 
describes monitoring of a physical phenomenon that creates
on average one event per cell over one cycle, a relatively 
short period of time, whose duration is larger but of the same 
order of magnitude as the application latency constraint. In 
other words, in any cycle, there are many active events in the 
network. 

2) Scenario B: Rare and Spatially Sparse Events: In this 
model, the events are rare and isolated. For example, if the 
allowed application delay is in minutes, the expected interval 
between events is measured in hours or days. They are spa- 
tially sparse: we assume that there is at most one active event 
at a time. In fact, the examples of monitoring applications
at the beginning of Section I-A all represent Scenario B. No
single node can statistically emulate the spatial and temporal
characteristics of the events in this model. In addition, by
observing a single node for a duration on the order of
the application delay, Eve cannot reliably deduce a deviation
from the expected behavior. Consequently, Eve attempts to
observe abnormalities in the network-wide traffic pattern. The
anonymity protection scheme described here implements the 
traffic pattern in a decentralized manner, so that the occurrence 
of real events does not cause observable abnormalities. 

B. Solution Outline 

The uniform spatial/time distribution of events guides nat- 
urally the baseline scenario for the dummy traffic: all cells 
in the network send dummy messages at a constant rate 
regardless of whether a real event has occurred or not. That 
means that an event would have to wait to be reported on 
average for half of the inter-transmission interval. However, 
since the traffic in the network always keeps the same pattern, 
it effectively defeats any traffic analysis techniques. The main 
problem with dummy traffic is immediately obvious from the 
basic scenario; limiting the reporting delay calls for a high- 
rate fake traffic, which is not only expensive but may quickly 
burn out the network. 

Our approach stipulates the importance of knowing the 
event's temporal dynamics, in terms of the described scenarios; 
it allows us to design energy-efficient protection strategies. It 
is natural to assume that the expected frequency of events 
will be known to everyone (both the attacker and the network 
architect), given that we design the monitoring application 
for a particular physical phenomenon, and that the easiest
characterization of a random process is through its first mo- 
ment, or the moment's estimate. Next, among the all-positive 
probability distributions with a given expected value, expo- 
nential distribution has the highest entropy. Hence, assuming 
that inter-event times follow an exponential distribution of 
the estimated mean leads to a good and justifiable model of 
the event's randomness. In terms of traffic overhead/energy
consumption and interference, the optimal design would force
each node to transmit as rarely as possible, and that would be 
in exponentially distributed intervals of the expected duration 
exactly equal to the expected time between real events; smaller 
intervals generate more traffic and therefore cost more, while 
larger ones create too few opportunities for embedding the 
real-traffic, especially under delay constraints. 

The second major underpinning of our approach is the 
network-centric view of the problem. The following aspects 
of the problem are looked at from both the event's and the 
network's perspective: 

• Event is characterized as a spatio-temporal process over
the whole network area, 

• Event-reporting delay includes routing latency, 

• Fake-traffic shaping is a decentralized process, collaboratively
maintained by all nodes,

• Energy consumption per-event of the protection strategy 
is equally split among network nodes, and substantially 
decreased with respect to the baseline strategy due to 
nodes' collaboration. 

Finally, our source-anonymity protection scheme aims to
achieve statistical event unobservability. The absolute protection
under the baseline strategy is not applicable to delay-sensitive
applications, such as the majority of event monitoring
in WSNs. Moreover, we do not adjust the timing of real events
as in [7], to make the event pass the statistical test under the
test parameters assumed to be used by Eve. Instead, we make
the event pass the test with the same probability as the dummy
transmissions, making it statistically indistinguishable.

C. Existing Research 

There are a couple of papers that study the WSN source 
anonymity and, in our opinion, provide relatively efficient 
solutions only under scenario A fSl, Q, IS). At the ex- 
pense of huge traffic overhead, a practical tradeoff between 
security and latency is proposed in [7]. The paper proposes
to decrease the delay in event notification by having every 
WSN node maintain a random schedule sampled from the 
same (exponential) distribution of inter-transmission intervals. 
The improved latency in event notifications with respect to 
the baseline approach is achieved without making the real 
traffic observable by Eve. This approach assumes that the 
attacker knows the defense strategy of the WSN, and that 
she will use a state-of-the-art statistical test to distinguish the
real event from the fake ones. The rationale is that fake
traffic, even if random, is designed to follow a distribution,
while the real events may not. According to [7], the event
sources should run the same test, and adjust the time of the 
real event to pass the test. Here, the event sources test only 
the intervals between own transmissions, as each node strives 
to maintain exponentiality of its own intervals. The test is 
assumed to be corrected for the finite sizes IS) by both Eve 
and the sources. The paper compares two varieties of this 
approach: one in which the real event embeds itself by waiting 
until next scheduled transmission (ProbRate), and another one 
(FitProbRate), when it waits as little as needed to pass the 
goodness-of-fit test for the exponential distribution, inferred 
from the previous transmissions. The latter approach results 
in smaller delay, but requires the correction of the schedule: 
the next scheduled fake event should be canceled, and the rest 
of the schedule is also verified so that the test failure at the 
next (dummy) transmission does not point out to the preceding 
disturbance. For monitoring WSN applications described in 
scenario A, where the spatio-temporal frequency of the events 
justifies frequent transmissions from every node, and hence, 
nodes create their own fake traffic, correcting the schedule 
can be implemented as an extension of the dummy schedule 
protocol. Let us note here that [7] does not explicitly state
that the exponential distribution of fake events is designed to 
emulate the expected frequency of actual events, although it 
can be inferred from the values used in the simulation. The 
paper does not assume any particular event distribution, and, 
consequently, does not evaluate the performance with respect 
to overhead per event. 

We further observe that the scheme proposed in [7] defines
delay as the time between the event occurrence and the 
source's transmission, which holds only for WSN applications 
in which the sink is one hop away from any source. If 
the packet is delivered to the sink in a hop-by-hop manner, 
the latency includes another random part due to summation 
of the exponentially distributed delays associated with the
transmission schedule of each relay. We refer to this additional
delay as the publishing route (PR) latency. When the expected
value $\lambda$ of the inter-transmission times is the same at each
node, as in [7], and designed to imitate a relatively rare event
pattern, for a source-sink route of h hops, the PR latency
becomes an Erlang-distributed random variable with mean $h\lambda$.
From this point of view too, Scenario B requires a modified
approach to source anonymity.
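
As a rough numerical illustration of the PR latency (a sketch under stated assumptions, not part of any of the analyzed schemes), the following Python snippet draws an independent exponential waiting time at each of h relays and averages the total over many trials. The function names, the hop count, and the 20 s mean interval are hypothetical values chosen only for illustration.

```python
import random


def pr_latency(h, mean_interval, rng):
    """One sample of the publishing-route latency.

    Assumption: at each of the h relays the packet waits for the relay's next
    scheduled transmission, an exponential time with the given mean.
    """
    return sum(rng.expovariate(1.0 / mean_interval) for _ in range(h))


def mean_pr_latency(h, mean_interval, trials=20000, seed=1):
    """Monte Carlo estimate of the mean PR latency over `trials` routes."""
    rng = random.Random(seed)
    total = sum(pr_latency(h, mean_interval, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical numbers: 10 hops, relays transmitting every 20 s on average.
    # The estimate should be close to h * mean_interval = 200 s.
    print(mean_pr_latency(10, 20.0))
```

With the mean interval stretched to the scale of rare events (hours instead of seconds), the same product grows to many hours, which is the point made above for Scenario B.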

To decrease the overhead of the dummy protection scheme 
in jT], in m the same authors propose a WSN with several 
proxy nodes, which pick up transmissions from surrounding 
nodes, and filter out the dummy packets. Apart from requiring 
mitigating solutions, frequent dummy traffic inevitably leads 
to interference, which additionally increases the PR latency. 

In summary, an important missing point in the existing 
research is the analysis of the overhead per event, especially 
in WSNs with rare and spatially sparse events. Under such a 
scenario (B), let us scrutinize the approach in [7], where each
node generates its own fake traffic. As explained, the overhead- 
optimal design would require each node to transmit with an 
average rate equal to the expected frequency of real events. 
For the cases simulated in [7], the mean of dummy message
intervals is 20 s, and real events arrive according to a Poisson
process with the rate changing from 1/20 to 1/100. Their
protection scheme achieves an average latency of less than
1 s. If we replace seconds with hours, having in mind events
that happen once a day, or once a couple of days, the delay of 
one hour does not seem to be acceptable. Additionally, under 
the overhead-optimal design, the PR latency for rare events is 
prohibitive, even for applications that are not delay-sensitive. 

The next section briefly introduces our solution to this
problem. Section III describes and analyzes the decentralized
algorithm that implements the proposed solution. Section IV
presents the simulations for some of the most realistic and
important event dynamics, and demonstrates the superiority
of our approach in protecting the anonymity of such a source.
In Section V we discuss statistical tests considered in our
approach. Finally, in Section VI we conclude.

II. Our Approach 

A. System Model 

We have a static WSN of n nodes. There is one static sink 
collecting event notifications from all nodes. The monitoring 
application is delay sensitive: the time between the event 
occurrence and the sink's notification must be smaller than 
$\Delta$. We assume that monitored events have a Poisson temporal
distribution of a known rate $\lambda = 1/\mu$, and a uniform spatial
distribution over the area of network deployment. Hence,
the time between the events is distributed according to an
exponential distribution (of expected value $\mu \gg \Delta$). The
source is assumed to transmit a burst of packets, all describing 
the event. For simplicity, we first analyze the burst of unit 
length (one packet). In addition, we separately analyze the 
existence of outliers in the event's distribution. 



B. Problem Formulation 

We already established that the only way to confuse Eve
is through persistent network-wide transmissions. At the same
time, as hop-by-hop is the prevalent data transfer model in WSNs,
and the distance to the sink may be considerable, to satisfy the
application latency constraints, we need to decouple routing
from the fake traffic schedule by allowing immediate relaying
of event notifications as opposed to piggybacking them on the
existing fake transmissions. However, as such a route may
be backtraced to the source, a similar routing path should
be emulated from each fake source (see Figure 1). Given the
mentioned constraints, we set the goal to achieve statistically
strong source anonymity through methods that optimize energy
and delay [5]. We hereby propose a pattern of fake traffic
that scales well with network size, and satisfies application
latency constraints, while protecting the source location up to
a given significance level, defined under the strong statistical
tests available to Eve [1], [12]. To confuse the attacker, we
propose to replicate the spatio-temporal process through the
following mechanisms:

• Source Emulation: a subset of d nodes regularly wakes
up to act as dummy sources. As a result, any real
event is covered by d dummy sources, which we refer
to as the dummy population. To explain what it means for
an event to be covered, we introduce a time interval,
dubbed a round, whose duration is equal to the expected
inter-event time. Hence, the length of a round is $\mu$.
Covering a source implies the expected existence of
fake transmissions in the same round in which the event
occurs. To engage all nodes equally, we may divide the
network into d groups and assign one representative of each
group to a distinct round. Each group will maintain a
schedule that emulates the event distribution $\mathcal{E}_{\mu}$. Due
to the size of the dummy population, the probability
distribution of the intervals between any two consecutive
dummy events is $\mathcal{E}_{\mu/d}$ (exponential of expected value
$\mu/d$). Such cooperative shaping of the fake traffic in
order to emulate a sufficiently dense Poisson process
is amenable to distributed implementation, which is
thoroughly explained in Section III. It also allows us to
uniformly distribute fake activity across the network area.
However, as the attacker overhears every transmission
that occurs in the network, and may integrate all recorded
temporal data into one global network activity timeline,
it is judicious to ignore for a moment the decentralized
implementation. Instead, we look at the global timeline as
if it were produced by a single source sampling the values
of event inter-arrivals from the distribution $\mathcal{E}_{\mu/d}$. The joint
empirical distribution of inter-transmission times, created
by extending the fake schedule with the immediate (undelayed)
transmissions of real events, is, based on the
transmissions' independence, $\mathcal{E}_{\mu/(d+1)}$, i.e., exponential
with the expected value $\mu/(d+1)$. For sufficiently
large d, $\mathcal{E}_{\mu/(d+1)}$ does not diverge perceptibly from the
distribution on the global timeline of fake events, $\mathcal{E}_{\mu/d}$.



• Route Emulation: each source (dummy or real) forwards 
the packet along a predetermined route towards the sink 
(see Figure 1). The inter-transmission time between relays
is constant and significantly shorter than $\Delta$ (and, consequently,
orders of magnitude smaller than $\mu$, as opposed
to [7], where it is tied to the inter-transmission time of
dummy sources). When the real source starts transmitting 
without delay, the application latency is equal to the 
routing delay, which is now decoupled from the time 
dynamics of the fake traffic, and can be further optimized 
by minimizing the number of hops. This can be achieved 
by implementing shortest-path routing, or by increasing 
the transmission range of the relays through the usage of 
advanced channel codes, while keeping the transmission 
power constant. 

• Knowing the Attacker's Detection Methods: Eve is 
assumed to be able to estimate the distribution of recorded 
transmission times. The estimated distribution is used as 
a reference point in the real-event detection strategy that 
involves a statistical test. The basic idea of those tests 
is to evaluate the distance between the distribution of 
the sample data and a specified probability distribution, 
which is the one that has been estimated based on a 
sufficiently large data set. Alternatively, the test evaluates 
if the current sample comes from the same distribution 
as the previously evaluated ones. If the distance is statis- 
tically significant, where the significance level is derived 
from a parameter of the test, which also defines the 
percentage FA of false alarms, it is decided that data 
do not follow this distribution. In Section V we describe
in more detail the testing strategies that Eve may opt for. 
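
The Source Emulation argument above can be checked numerically with a short sketch. It assumes, for illustration only, that the global dummy timeline is an ideal Poisson process with mean gap mu/d and that real events form an independent Poisson process with mean gap mu; the function names and parameter values are hypothetical. Merging the two timelines should give a mean gap close to mu/(d+1), which for large d is nearly the same as mu/d.

```python
import random


def poisson_times(mean_gap, horizon, rng):
    """Event times of a Poisson process with the given mean gap on [0, horizon)."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean_gap)
        if t >= horizon:
            return out
        out.append(t)


def merged_mean_gap(mu, d, rounds=2000, seed=7):
    """Mean gap on the global timeline after inserting undelayed real events."""
    rng = random.Random(seed)
    horizon = rounds * mu
    fake = poisson_times(mu / d, horizon, rng)   # dummy transmissions
    real = poisson_times(mu, horizon, rng)       # real events, not delayed
    merged = sorted(fake + real)
    gaps = [b - a for a, b in zip(merged, merged[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    mu = 1.0
    for d in (10, 100):
        # Compare the simulated mean gap with mu/(d+1) and mu/d.
        print(d, merged_mean_gap(mu, d), mu / (d + 1), mu / d)
```

The relative difference between mu/d and mu/(d+1) is 1/(d+1), which is one way to read the requirement that d be sufficiently large.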

As the latency issue is decoupled from the fake traffic design, 
we seek to determine the minimal dummy population size 
needed to secure a given statistical anonymity, hence minimizing
the overhead. For a WSN of size n, we define $W_n$
as the per-event and per-round energy consumption of the source
anonymity mechanism. We express $W_n$ in terms of the number
of packet-forwarding hops, where we upper-bound the length
of the publishing route by $h = O\left(\sqrt{n/\log n}\right)$, assuming the
transmission range necessary for a connected network of size
n [10]. Hence, the cost of each fake source transmission will
be of the same order. With d fake sources covering each real
event, $W_n = O\left((d+1)\sqrt{n/\log n}\right)$, which demonstrates the
importance of optimizing d, as the source anonymity calls for
a sufficiently large d.
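
To give a feel for how the overhead scales, the sketch below evaluates the order-of-magnitude expressions with all hidden constants set to 1, reading the route-length bound as sqrt(n / log n) hops and the per-round cost as (d + 1) such routes. Both the unit constants and the function names are assumptions made only for this illustration.

```python
import math


def route_hops_bound(n):
    """Order-of-magnitude route length in hops, with the constant set to 1."""
    return math.sqrt(n / math.log(n))


def per_event_overhead(n, d):
    """Order-of-magnitude hop count per round: d dummy routes plus one real route."""
    return (d + 1) * route_hops_bound(n)


if __name__ == "__main__":
    n = 1000  # hypothetical network size
    for d in (10, 100):
        print(d, per_event_overhead(n, d))
```

Under these assumptions, moving from d = 10 to d = 100 multiplies the per-round hop count by 101/11, roughly a factor of nine, independently of n.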

III. Decentralized Generation of Fake Traffic 

A. Baseline Decentralized Algorithm 

Fig. 1. A dummy population of size d accompanies an event; each node is
assigned the round index j at initialization, and the index assignment
sometimes includes extra effort to create spatial clusters of nodes within the
dummy population, to further confuse Eve (see the nodes delimited by the
dashed-blue ellipse).

In Section II we highlighted the importance of cooperative
and distributed shaping of the fake traffic that should
result in a sufficiently dense Poisson distribution of dummy
transmissions. Let us now propose the realization of such
a decentralized system. We first establish the baseline for
the decentralized implementation. As the dummy population
covering each event needs to include on average d fake
sources, we define an epoch of duration $T = \mu n/d$ in which each
network node will get to be a dummy once. Then, we let each
node draw a time instant to transmit in this epoch by sampling
the uniform distribution $U(0, T)$. The causality of transmissions
will arrange all node samples in increasing order, resulting in
the exponential distribution $\mathcal{E}_{\mu/d}$ of inter-transmission times. The
procedure can be extended to the consecutive epochs, so that
in the ith epoch nodes draw their transmission times from
$U((i-1)T, iT)$. Note that the particular uniform distribution
range does not overlap with the ranges of distributions per- 
taining to other epochs. The collective empirical distribution 
of transmission times is the distribution of almost independent 
disjoint events, and therefore it approximates the Poisso- 
nian distribution. The independence of transmission events 
is broken only on the boundary of the epochs, as with each 
new epoch the nodes sample from a uniform distribution of 
different disjoint range. Hence, the distribution of the interval 
Z between the first event in the new epoch and the last event 
in the previous epoch is not exponential. 

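Let $U$ denote the earliest transmission time in the new epoch and $V$ the latest transmission time in the previous epoch, each measured from the start of its own epoch; both are extremes of n independent samples $X_1, \ldots, X_n$ of $U(0, T)$. Then

$$Z = T + U - V.$$

For the minimum,

$$F_U(u) = 1 - \left(1 - F_X(u)\right)^n, \qquad f_U(u) = \frac{n}{T}\left(1 - \frac{u}{T}\right)^{n-1}$$

for $T > u > 0$, and o.w. 0, which is, for sufficiently large n, clearly an exponential
distribution of expected value $T/n = \mu/d$. For the maximum,

$$f_V(v) = \frac{n}{T}\left(\frac{v}{T}\right)^{n-1}$$

for $T > v > 0$. For sufficiently large n,

$$f_V(v) = \frac{d}{\mu}\, e^{-\frac{d}{\mu}(T - v)}.$$

For $W = T - V$, $f_W(w) = \frac{d}{\mu}\, e^{-\frac{d}{\mu} w}$, where now
$Z = U + W$.
The random variable $Z$ has a range $[0, 2T]$, and, as the sum of two independent, approximately exponential variables, its
probability distribution is Erlang, with the shape parameter of
2, denoted $\mathcal{E}(2, \mu/d)$.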
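
The baseline procedure of this subsection can be sketched in Python as follows. This is a minimal illustration, assuming an epoch length T = mu * n / d and ideal clocks; the function names and the parameter values in the example are hypothetical.

```python
import random


def baseline_schedule(n, d, mu, epochs, seed=3):
    """Global timeline of dummy transmissions for the baseline algorithm.

    In every epoch of length T = mu * n / d (assumed), each of the n nodes
    draws one transmission time uniformly within that epoch.
    """
    rng = random.Random(seed)
    T = mu * n / d
    times = []
    for i in range(epochs):
        times.extend(i * T + rng.uniform(0.0, T) for _ in range(n))
    times.sort()  # Eve observes the transmissions in chronological order
    return times


def gaps(times):
    """Inter-transmission intervals of a sorted timeline."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    mu, d = 1.0, 10
    g = gaps(baseline_schedule(n=500, d=d, mu=mu, epochs=20))
    mean_gap = sum(g) / len(g)
    # For an exponential law, the fraction of gaps above the mean is exp(-1).
    frac_above_mean = sum(x > mu / d for x in g) / len(g)
    print(mean_gap, frac_above_mean)
```

The mean gap should come out close to mu/d, and the fraction of gaps exceeding mu/d close to exp(-1), as expected for nearly exponential intervals.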
As the probability of an inter-epoch sample in the collection
of test samples of size t, where t < n, is $1/t$ or 0, the average
test-failure probability will be at most $\frac{1}{t} FA_e + \frac{t-1}{t} FA$, where
$FA_e$ denotes the failure probability when Erlang-distributed samples
are tested for exponentiality. Even though the distribution
of Z differs from the exponential distribution $\mathcal{E}_{\mu/d}$, we observe that the baseline decentralized
implementation will pass the exponentiality test for any
reasonable size of t. Such a distribution satisfies our needs for 
real-event obfuscation. However, for other reasons (explained 
later in this section), related to cases when the outliers of 
the real-event temporal distribution coincide with the spatial 
correlation of events, we propose the following realization of 
the dummy traffic, dubbed group implementation. 
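
The behavior of the interval Z at an epoch boundary can also be checked numerically. The sketch below simulates Z = T + U - V directly, with V the latest of n uniform draws in the previous epoch and U the earliest of n uniform draws in the new one; the epoch length T = mu * n / d, the function names, and the parameter values are assumptions for illustration. An Erlang variable of shape 2 built from two exponentials of mean mu/d has mean 2 * mu / d, twice the mean of an ordinary interval.

```python
import random


def inter_epoch_gap(n, T, rng):
    """Gap between the last transmission of one epoch and the first of the next."""
    v = max(rng.uniform(0.0, T) for _ in range(n))  # latest time, previous epoch
    u = min(rng.uniform(0.0, T) for _ in range(n))  # earliest time, new epoch
    return T + u - v


def mean_inter_epoch_gap(n, d, mu, trials=2000, seed=9):
    """Monte Carlo estimate of the mean inter-epoch interval."""
    rng = random.Random(seed)
    T = mu * n / d
    return sum(inter_epoch_gap(n, T, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Hypothetical setting: n = 200 nodes, d = 10, mu = 1.
    print(mean_inter_epoch_gap(200, 10, 1.0))  # expected to be near 2 * mu / d
```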

B. Group Algorithm 

In the initialization, the WSN nodes are divided into d
groups of size $\lfloor n/d \rfloor$, and every node in the group is assigned
an index i, denoting the round in which the node
will cover the source. Hence, the kth round, where $k =
i \bmod \lfloor n/d \rfloor$, $i \in \{1, \ldots, \lfloor n/d \rfloor\}$, will have a dummy population
of d nodes belonging to different groups. A group schedule
is created by letting the ith member select the specific time
instant to transmit a fake message, sampled from the uniform
distribution $U((i-1)\mu, i\mu)$. Such an algorithm is amenable to
distributed implementation, since each node can independently
measure time and keep count of the current round. Once the
round index corresponds to the node's index modulo group size,
the node draws a sample from the pertaining distribution,
and determines its transmission time. As the independence
of transmission events is broken only on the boundary of the
rounds, the distribution of the interval Z between the first event
in the new round and the last event in the previous round is
not exponential. Once again, $Z = U + W$, where $W = \mu - V$,
$V = \max\{X_1, \ldots, X_d\}$, and $U = \min\{X_1, \ldots, X_d\}$.



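$$F_U(u) = 1 - \left(1 - F_X(u)\right)^d \tag{10}$$

and

$$f_U(u) = \frac{d}{\mu}\left(1 - \frac{u}{\mu}\right)^{d-1} \quad \text{for } \mu > u > 0, \tag{11}$$

and o.w. 0, which is, for sufficiently large d, an exponential
distribution of expected value $\mu/d$. Now, following a derivation
similar to the one used for the baseline algorithm, we obtain

$$f_V(v) = \frac{d}{\mu}\left(\frac{v}{\mu}\right)^{d-1} \tag{12}$$

for $\mu > v > 0$. For large d, the distribution of W is the exponential distribution $\mathcal{E}_{\mu/d}$. The
random variable Z has a range $[0, 2\mu]$, and the probability
distribution is

$$f_Z(z) = z\,\frac{d^2}{\mu^2}\, e^{-\frac{d}{\mu} z} = \mathcal{E}\left(2, \frac{\mu}{d}\right), \tag{13}$$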



for sufficiently large d. Hence, with the group implementation Z
follows the same Erlang distribution $\mathcal{E}(2, \mu/d)$ as in the baseline
decentralized implementation. However, to be able to state that
the test sample of size t < n, which includes inter-round intervals
with probability $1/d$, will be statistically indistinguishable
from the sample of exponentially distributed intervals, we
need to impose a stricter requirement for the value of d.
The average test-failure probability will be $\frac{1}{d} FA_e + \frac{d-1}{d} FA$.
Upper-bounding the failure probability $FA_e$ (when Erlang
samples are tested for exponentiality) with one, we obtain
that d should be at least $1/FA$. If the spatially uniform events
do follow the distribution $\mathcal{E}_{\mu}$ in time, this may unnecessarily
increase the per-event overhead with respect to the baseline.
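
The group schedule and the resulting requirement on d can be sketched as follows. This is an illustrative simplification: it models only the timing (one representative per group per round, each drawing a uniform time within the round of length mu) and ignores the node-to-group assignment and routing. The function names are hypothetical, and the failure-probability expression is the mixture of a 1/d share of inter-round intervals with a (d-1)/d share of ordinary intervals, with the Erlang failure probability pessimistically set to one by default.

```python
import math
import random


def group_schedule(d, mu, rounds, seed=5):
    """Global dummy timeline: in each round, d representatives (one per group)
    each draw a transmission time uniformly within that round."""
    rng = random.Random(seed)
    times = []
    for r in range(rounds):
        times.extend(r * mu + rng.uniform(0.0, mu) for _ in range(d))
    times.sort()
    return times


def mixed_failure_bound(d, fa, fa_erlang=1.0):
    """Average test-failure probability when a 1/d share of the intervals
    is inter-round (failing with probability fa_erlang) and the rest fail
    with the nominal false-alarm probability fa."""
    return fa_erlang / d + (d - 1) / d * fa


def min_groups(fa):
    """Smallest integer d with d >= 1/fa (rounded to guard against float noise)."""
    return math.ceil(round(1.0 / fa, 9))


if __name__ == "__main__":
    fa = 0.01  # hypothetical false-alarm parameter
    d = min_groups(fa)
    print(d, mixed_failure_bound(d, fa))
    times = group_schedule(d, mu=1.0, rounds=50)
    print((times[-1] - times[0]) / (len(times) - 1))  # mean gap, near mu/d
```

With d = 1/FA, the pessimistic bound stays within about twice the nominal false-alarm probability.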

1) Spatial Correlation: However, if we have a more complex
event distribution, by selecting the group algorithm we
are able to not only render the event's temporal characteristics
indistinguishable, but also to obfuscate spatial correlation. For
example, a time-burst of events (temporally correlated observations),
illustrated in Figure 2(C), may coincide with the spatial
correlation of the same events. The time burst can be handled
in both decentralized algorithms by a careful design of d, as
we will illustrate in Section IV. The group implementation can
be leveraged to disguise the spatial patterns by regularly (in
each round) assigning a sufficiently large part of the dummy
population to random adjacent nodes (such as nodes within the
dashed-blue ellipse in Figure 1), while the rest of the dummy
population will be uniformly scattered across the network. If
the average spatial burst of real events is of size b < d, then in
the ith round one of d groups may be the principal one, whose
representative will therefore select the closest b unscheduled
nodes to be representatives of additional b groups, and the rest
of the dummy population is selected randomly. In the rest of
the paper we will focus on the group implementation.

Fig. 2. (A) Illustration of our group algorithm for fake-traffic generation,
where we randomly form d groups of nodes to cooperatively create the
schedule by sampling a series of uniform distributions. (B) Large enough d
renders small divergence from the Poisson distribution statistically irrelevant.
Bottom axis: for sufficiently large d, mixing several "group" schedules and
the timeline of the real events (red stars) into a global transmission schedule
observed by Eve is statistically indistinguishable from a Poisson schedule. (C)
Exponential distribution of events disturbed by a burst of events with small
interarrival times, called pauses.

IV. Simulations 

When designing the simulations, we dismissed the possibility
that Eve would test the schedule of any single node, since,
in our scenario, inter-transmission times per node are large
with respect to $\mu$ (inter-event times), and, hence, it takes a lot
of time to record a reasonable test sample. Our primary goal
was to demonstrate the influence of the dummy population size
d on the statistical properties of the network-wide transmission
schedule, both in the absence of real events, and under
different stochastic models for real events. For sufficiently
large d, which is still much smaller than n, our simulations
show that the insertion of events does not statistically change
the time axis. Therefore, by running the statistical tests, Eve
does not obtain any additional information that would help her
capture the monitored object, even if the time of transmission
of a real source is not delayed. With the existing work [7],
an adjustment delay is added, and another mechanism may be
needed to fix the sample mean affected by the adjustments, to
delude Eve's sequential analysis tests, such as SPRT [12].

Performing the Anderson-Darling (A-D) test ID, which is 
a powerful test for exponentiality ||3|, ||9l, on the samples 
drawn from an exponential distribution results in a percentage 
of failures, which represent false alarms. The percentage of 
false alarms is a random variable whose mean corresponds 
to the false-alarm parameter of the test (also referred to as 
the significance level), denoted by a. Due to randomness, 
over certain sets of test samples this percentage will fluctuate 
around the value of the parameter provided by the test. Our 

Fig. 3. The effect of the sample size on the False Alarm (FA) trend (FA as
a function of the round number, each round containing d samples): (A) For
samples drawn from an exponential distribution and the sample size di at round i, FA's
mean stays constant. (B) For uniformly sampled transmission times, the sample size
equal to d includes samples mostly from a single round, hence exponential samples,
and FA's mean is constant as in (A). (C) For uniform sampling and the sample size di
at round i, for d = 100 FA achieves a close approximation of the exponential distribution
across the rounds, as opposed to d = 10, which does so only within one round. (D) At
each new round we test the 200 preceding samples.

testing strategy monitors the rate of test failures to evaluate if 
it behaves as expected for the exponential distribution. Certain 
outliers do not perceptibly change the rate of failures, but 
more frequent or consistent divergence from the exponential 
distribution of samples will have a visible effect on the 
monitored rate. 
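
As a minimal sketch of this monitoring strategy (not the implementation used to produce our figures), the following Python fragment repeatedly applies an A-D test of exponentiality to pure exponential samples and reports the empirical rate of failures. The function names, the sample-size adjustment `1 + 0.6/n`, and the 1% critical value `1.957` are assumptions made for illustration; they should be replaced by the table actually used in a given implementation.

```python
import math
import random

# Assumption: modified statistic A^2 * (1 + 0.6/n) with a 1% critical value
# of 1.957 for the exponential case with an estimated mean.
CRITICAL_1PCT = 1.957

def ad_statistic_expon(samples):
    """Modified Anderson-Darling statistic for exponentiality (mean estimated)."""
    n = len(samples)
    mean = sum(samples) / n                      # maximum-likelihood estimate
    z = sorted(x / mean for x in samples)        # normalized, ordered samples
    total = 0.0
    for i in range(1, n + 1):
        log_cdf = math.log(-math.expm1(-z[i - 1]))   # ln F(z_(i))
        log_sf = -z[n - i]                           # ln(1 - F(z_(n+1-i)))
        total += (2 * i - 1) * (log_cdf + log_sf)
    a2 = -n - total / n
    return a2 * (1.0 + 0.6 / n)

def false_alarm_rate(num_tests, sample_size, mean, rng):
    """Fraction of exponential samples that the test rejects (false alarms)."""
    failures = 0
    for _ in range(num_tests):
        sample = [rng.expovariate(1.0 / mean) for _ in range(sample_size)]
        if ad_statistic_expon(sample) > CRITICAL_1PCT:
            failures += 1
    return failures / num_tests

rng = random.Random(1)
rate = false_alarm_rate(num_tests=1000, sample_size=200, mean=0.01, rng=rng)
print(f"empirical false-alarm rate: {rate:.3f}")
```

Over a finite number of tests the printed rate fluctuates around the significance level, which is the reference behavior against which a persistent divergence is judged.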

If d is not sufficiently large, inter-transmission times in
the group implementation would not follow the exponential
distribution, and goodness-of-fit tests of exponentiality [4] will fail in
a much larger percentage of cases than the value of the test's
FA parameter. For small d, the test results are also sensitive to
the sample size: smaller samples that span just one round
will most likely pass the test, since ordered uniform samples

Fig. 4. For $\alpha = 0.01$, false alarm (FA) vs. the event number, when
events expected to be from the exponential distribution $\mathcal{E}_{\mu}$ are inserted into
the fake traffic, where the graph with circles represents d=10, and the graph
with diamonds d=100: (a) event distribution as expected; (b) expected
event distribution disturbed by randomly positioned samples of the exponential
distribution $\mathcal{E}_{\mu/1000}$ making up 20% of all samples; (c) expected distribution
disturbed by randomly positioned samples of the exponential distribution $\mathcal{E}_{\mu/100}$
making up 20% of all samples; (d) expected distribution disturbed by randomly
positioned samples of the exponential distribution $\mathcal{E}_{\mu/10}$ making up 20% of all
samples.

in any range $[(i-1)\mu, i\mu]$ produce intervals described by the
exponential distribution $\mathcal{E}_{\mu/d}$. This is illustrated in Figure 3(b).
A large number of samples would include many rounds, with
a 1/d portion of samples not belonging to the exponential
distribution. Figure 3 compares the pure exponential samples,
whose false alarm statistics are independent of the sample size
(panel (a)), with the samples obtained using our decentralized
algorithm for d = 10 and d = 100, when the tested sample
size was d in each round (b), di in the ith round (c), or 200
constantly (d). The FA statistics are plotted as a function of the
round number i, although the sample size was changing with
the round index as di only in subfigure (c). In this case, we
see that for a sufficiently large d of 100 our algorithm based
on uniform sampling achieves a close approximation of the

Fig. 5. For $\alpha = 0.01$, false alarm (FA) vs. the event number, where the graph
with circles represents d=10, and the graph with diamonds d=100: (a)
expected event distribution disturbed by a burst of size 10 and pause
$\mu/1000$; (b) expected distribution disturbed by a burst of size 10 and pause
$\mu/100$; (c) expected distribution disturbed by a burst of size 10 and pause
$\mu/10$.

exponential distribution, as it exhibits the same FA percentage
across the rounds, as opposed to d = 10, which stays
exponential only within one round. In subfigure (d), at each new
round we test the 200 preceding samples, which for d = 10 can
be collected only after 20 rounds (notice the slope in the upper
graph for the first 20 rounds). The same sample size of 200
was used to test the influence the inserted real events had on
the percentage of test failures, illustrated in Figures 4 and 5.
As our decentralized algorithm for small dummy populations
does not approximate the exponential distribution sufficiently well,
it is also more sensitive to the disturbances due to the insertion
of real events. When d is small (10), interleaving the real
event schedule into the fake-event timeline even improves
the results of the A-D test. Although this phenomenon can
be explained by the decreased contribution of Erlang samples
(relative to exponential), we omit the detailed explanation due
to space limitations.
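
The dependence on d can be illustrated with a short sketch. It assumes a simplified model, introduced here only for illustration, in which each of the d dummy sources picks one transmission time uniformly at random within every round of length `mu`; the function names are hypothetical. The sketch compares the mean of the resulting intervals with `mu/d` and reports their coefficient of variation (CV), which equals 1 for an exponential distribution. Intervals that straddle a round boundary are sums of two spacings (Erlang-like), and their relative weight shrinks as d grows.

```python
import math
import random

def group_schedule(d, rounds, mu, rng):
    """Assumed model: d uniform transmission times in each round of length mu."""
    times = []
    for r in range(rounds):
        start = r * mu
        times.extend(sorted(start + rng.uniform(0.0, mu) for _ in range(d)))
    return times

def interval_stats(times):
    """Return the mean and the coefficient of variation of consecutive gaps."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / (len(gaps) - 1)
    return mean, math.sqrt(var) / mean

rng = random.Random(7)
mu = 1.0
for d in (10, 100):
    mean, cv = interval_stats(group_schedule(d, rounds=1000, mu=mu, rng=rng))
    print(f"d={d:3d}: mean interval={mean:.4f} (mu/d={mu / d:.4f}), CV={cv:.3f}")
```

A CV noticeably below 1 for small d is one simple symptom of the departure from exponentiality that the A-D test picks up on samples spanning many rounds.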

As explained in the previous section, for larger d, the
exponential distribution $\mathcal{E}_{\mu/d}$ is statistically indistinguishable
from the distribution of intervals produced by the group
algorithm. Accordingly, the false alarm frequency obtained
with decentralized sampling of ordered uniform distributions
is exactly what we obtained testing pure exponential samples.
The mean of the distribution is $\mu/d$, which is also sufficiently
small to allow imperceptible interleaving of the real event
samples.

We also test the proposed location-anonymity protection
for a more general event distribution, i.e., when the empirical
distribution is disturbed with respect to the estimated one,
either by spurious outliers sampled from some $\mathcal{E}_{\mu/g}$, $g > 1$,
or by bursts of events whose triggers are separated by
exponentially distributed intervals according to $\mathcal{E}_{\mu}$, as illustrated
in Figure 2(c). The latter case is of most interest to us, as
it represents a common scenario in which a whole cluster of
events, usually observed by several sensors, is related to a
single object. Temporally, the burst is a set of events whose
inter-event times, referred to as pauses, are much smaller than
one round, and whose size is a considerable fraction of the test
sample size. Such clusters may occur as a result of the inherent
physical dynamics of the observed process. We assume that
the spatial correlation among events in the burst does not exist,
but we offer a solution for the opposite case as well (Figure 1).
We now provide more detail about the simulations presented in
Figures 4 and 5. We start with a pre-designed schedule where
a timeline for fake events is determined by drawing $ad$ samples
from the exponential distribution $\mathcal{E}_{\mu/d}$, where $a = 80$ is the
time window we observe, measured in rounds. We perform the
following experiments with such a fake-event schedule, where
the plots in Figures 4 and 5 illustrate the average behavior
over 250 runs, based on different, independently drawn fake
and real timelines:

• Adding real events that follow the estimated distribution
$\mathcal{E}_{\mu}$: As we insert events into the fake schedule, we perform
the A-D test on the sample which includes the event itself
and the 200 previous events (if accumulated by the time of
the event occurrence).

• Adding real events that follow the estimated timeline
perturbed in 20% of the cases, picked at random,
by samples taken from an exponential distribution with a
smaller expected value: We perform the same A-D test
as above for each real event, where the expected value
of the disturbing distribution is $\mu/1000$, $\mu/100$, $\mu/10$ (in
Fig. 4(b), (c), (d), respectively), for d = 10 and d = 100.

• Adding real events that follow the estimated timeline,
with the exception of one burst of close samples (having
small intra-burst intervals, or pauses): We perform the
same A-D test as above for each real event, where the
values for the pause duration are $\mu/1000$, $\mu/100$, $\mu/10$
(in Fig. 5(a), (b), (c)). Over all runs of the experiment,
the burst position is kept constant (in the same round), to
highlight the impact on any single run.
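
The first of the experiments listed above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the simulation code behind our figures: the helper names, the default of 20 runs, the use of the 200 intervals preceding each real event as the test sample, and the A-D constants (`1 + 0.6/n` adjustment, critical value `1.957` for a 1% significance level) are all assumptions.

```python
import math
import random
from itertools import accumulate

CRITICAL_1PCT = 1.957   # assumed 1% critical value of the modified statistic
WINDOW = 200            # number of preceding intervals tested at each real event

def ad_statistic_expon(samples):
    """Modified A-D statistic for exponentiality with an estimated mean."""
    n = len(samples)
    mean = sum(samples) / n
    z = sorted(x / mean for x in samples)
    total = sum((2 * i - 1) * (math.log(-math.expm1(-z[i - 1])) - z[n - i])
                for i in range(1, n + 1))
    return (-n - total / n) * (1.0 + 0.6 / n)

def exponential_timeline(mean_gap, count, rng):
    """Event times obtained by accumulating `count` exponential gaps."""
    return list(accumulate(rng.expovariate(1.0 / mean_gap) for _ in range(count)))

def failures_at_real_events(fake, real):
    """Run the A-D test at each real event on the WINDOW preceding intervals."""
    merged = sorted([(t, False) for t in fake] + [(t, True) for t in real])
    times = [t for t, _ in merged]
    tests = fails = 0
    for k, (_, is_real) in enumerate(merged):
        if is_real and k >= WINDOW:
            gaps = [times[j] - times[j - 1] for j in range(k - WINDOW + 1, k + 1)]
            tests += 1
            fails += int(ad_statistic_expon(gaps) > CRITICAL_1PCT)
    return tests, fails

def run(d, mu=1.0, rounds=80, runs=20, seed=0):
    """Average failure rate over several independently drawn timelines."""
    rng = random.Random(seed)
    tests = fails = 0
    for _ in range(runs):
        fake = exponential_timeline(mu / d, rounds * d, rng)
        # Keep only real events that fall inside the span of the fake timeline.
        real = [t for t in exponential_timeline(mu, rounds, rng) if t < fake[-1]]
        t, f = failures_at_real_events(fake, real)
        tests, fails = tests + t, fails + f
    return fails / tests if tests else float("nan")

print("failure rate at real events, d=100:", run(100))
```

The perturbed and bursty variants can be explored by replacing the `real` timeline with one that contains outliers or a burst; no particular outcome for those variants is claimed here.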

All these experiments are repeated when a timeline for fake
events is generated by drawing $ad$ samples using the baseline
decentralized algorithm, and finally when the fake-event
schedule is created by the group decentralized method. It is
important to highlight that the obtained false alarm statistics, in the
case of sufficiently large d, were the same for all three mechanisms
of fake traffic generation. Hence, our approximation performs
well for properly designed values of d. Figure 5 illustrates the
importance of the proper design, as a pause duration which
is smaller than the network-level inter-transmission time $\mu/d$ causes
a considerable increase of the FA rate if the burst is long enough.

V. Discussion of Statistical Tests 

A. Tests of Exponentiality 

We base our analysis on the assumption of a single statistical
test, namely the Anderson-Darling test for exponentiality. Apart
from the motives of simplicity, and the existence of a reference
that employs the same test [1], we here provide additional
arguments for such an approach. The goodness-of-fit (GOF)
of a statistical model describes how well it fits a set of
observations. Measures of goodness of fit typically depict the
discrepancy between observed values and the values expected
under the hypothesized probability distribution $F(x)$. The
classical GOF test is Pearson's $\chi^2$ test, which is good only for
discrete distributions. EDF tests refer to the Empirical Distribution
Function, denoted $F_n(x)$, and defined as the relative number
of samples (out of n) that are smaller than or equal to x. EDF
tests define different statistics, based on a measure of the
discrepancy between $F_n(x)$ and the probability distribution
in question.
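
For reference, the EDF and the A-D discrepancy can be written in their standard textbook form. The sample notation $x_1, \dots, x_n$ and the indicator $\mathbf{1}\{\cdot\}$ are introduced here for illustration only:

```latex
F_n(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{x_i \le x\},
\qquad
A^2 = n\int_{-\infty}^{\infty}
      \frac{\bigl(F_n(x)-F(x)\bigr)^2}{F(x)\,\bigl(1-F(x)\bigr)}\,dF(x).
```

The weight $1/[F(x)(1-F(x))]$ grows near both ends of the support, so the statistic emphasizes discrepancies in the tails of the distribution.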

In [1], five of the leading EDF statistics, including the
statistic defined under the A-D test, are examined in three
important situations: when the hypothesized distribution $F(x)$
is completely specified, when $F(x)$ is normal, and finally, the
case that concerns us, when $F(x)$ is exponential, with the
expected value to be estimated. The paper shows that, when used
properly, EDF tests have much higher powers than previously
reported, and those of $A^2$ are comparable to, and appear to be
highly correlated with, the Shapiro-Wilk regression test statistics,
which were previously considered superior in power, although
difficult to compute. In fact, all the examined tests are shown
to be competitive in terms of power. In particular, $A^2$ seems to
be especially powerful in detecting divergence in the mean. The
results published in [1] are very important, as EDF statistics are
easily calculated. The two facts represent the central argument
for using the Anderson-Darling test as a model for Eve's approach
to statistical inference.

However, for the A-D test of exponentiality when the
distribution parameter is unknown, we ought to calculate
an estimate of the expected value $\nu$. The maximum likelihood
estimate of the parameter $\nu$ is exactly the mean of the samples.
While this estimate is the most likely reconstruction of the
true parameter, it is only an estimate, and, hence, the more
data points are included, the better the estimate will be.
The confidence interval of how good the estimate is can
be exactly calculated based on the number of samples [11].
Hence, when testing the most recently recorded transmission
intervals, Eve will have to consider a large enough sample
(observation window) in order to secure sufficient confidence
in the hypothesized distribution $F(x)$. Moreover, account
must be taken of the sample size in adjusting either the test
statistic or its critical values for the desired $\alpha$. If the value of
the statistic is found to be larger than the critical value (also
called the significance point), the test claims that the sample is
not from the exponential distribution. The asymptotic critical
values have been calculated theoretically in [1]. For finite
n, significance points are difficult to find theoretically and
have been found through Monte-Carlo simulations by the same
author. The A-D significance points for various values of $\alpha$,
and several sample sizes, are given in Table 2 of [1], and are
utilized in our implementation of the A-D test.
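
As a worked form of the quantities above, consider the standard setting of n independent exponential intervals $x_1, \dots, x_n$ with mean $\nu$. The symbols $\hat{\nu}$, $\gamma$, $z_{(i)}$ and the quantile notation $\chi^2_{2n,p}$ are introduced here for illustration. The estimate, its exact confidence interval, and the computing form of the statistic then read:

```latex
\hat{\nu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\frac{2n\hat{\nu}}{\nu} \sim \chi^2_{2n}
\;\Longrightarrow\;
\frac{2n\hat{\nu}}{\chi^2_{2n,\,1-\gamma/2}} \le \nu \le
\frac{2n\hat{\nu}}{\chi^2_{2n,\,\gamma/2}}
\quad\text{with probability } 1-\gamma,

A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)
      \Bigl[\ln\bigl(1-e^{-z_{(i)}}\bigr) - z_{(n+1-i)}\Bigr],
\qquad z_{(i)} = \frac{x_{(i)}}{\hat{\nu}} .
```

The relative width of the interval shrinks roughly as $1/\sqrt{n}$, which is why a large observation window is needed. A commonly quoted sample-size adjustment multiplies $A^2$ by $(1 + 0.6/n)$ before comparison with the critical value; we mention it as an assumption and not as a restatement of the tabulated values referred to above.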

B. Utilization of Tests in Our Protection Scheme 

We design fake traffic so that the test-failure statistics do
not change if exponentially distributed real events are added
to the schedule. Given the average interval between real events
$\mu$, this is achieved if $\nu = \mu/d$ and d is sufficiently large. For
the reasons stated above, we assume that Eve will work with
larger sample sizes. We tested with samples of length 200,
even though for exponentially distributed real events without
outliers, samples ten times shorter work equally well. When
interpreting the results of statistical tests in the context of
our protection scheme, it is of extreme importance to model
the testing strategy used by Eve. A certain percentage of test
failures, on the order of the false-alarm parameter, will occur for
perfectly exponential test samples, and over time we will also
observe some variance in the rate of those failures. We refer
to this rate as the false-alarm rate. The same rate of failures
quantifies the probability of Eve's false decision that the real
event has happened. We design the fake traffic to achieve
the statistical indistinguishability of the traffic pattern from
a Poisson process (i.e., exponential distribution of intervals),
which is achieved when the rate of failures produced by testing
the global traffic has the same dynamics as the false-alarm
rate. Only if an unexpected number of outliers is present in
the real event distribution will the failure rate increase above
the false-alarm level. Eve sets her FA parameter ($\alpha$) based
on her own measure of how important it is not to detect an
event when there is none. If it is not important, Eve may set
the FA parameter high in order to improve the probability of
detection. In the opposite case, she will set the FA parameter
low. In both cases our protection mechanism will successfully
counteract, as the fake-traffic design did not depend on the $\alpha$
parameter used in the attacker's tests; instead, the empirical
false-alarm statistics were used as a reference to detect the
divergence from exponentiality.

In addition, by relying upon the test results, which may be
false, Eve risks being discovered, hence creating an outage
of the global attack. Eve may opt for a different strategy:
if making a decision based on a false alarm creates an outage
of the attack, Eve will not use the immediate result of the
test to make a decision. Instead, Eve will look at the trends
of failure statistics, trying to detect a consistent divergence that
drives the percentage of failures above the relaxed significance
level, based on the empirical rate of failures (see Figure 5).
This detection procedure is efficient in detecting outliers, such
as a burst of events which deviates from the distribution
accounted for in the event model and/or a set of event
transmissions that are correlated in space.
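
The trend-based strategy can be sketched as follows. This is a hypothetical illustration of how Eve might track the failure rate, not a procedure taken from our experiments: the function name `trend_alarms`, the window length, and the relaxation of the threshold by three binomial standard deviations are assumptions.

```python
import math
from collections import deque

def trend_alarms(outcomes, alpha_emp, window=100, k=3.0):
    """Return indices where the windowed failure rate exceeds a relaxed level.

    outcomes  : iterable of booleans, True meaning that the A-D test failed
    alpha_emp : empirical false-alarm rate observed on event-free traffic
    window, k : hypothetical tuning knobs (window length, tolerance in
                binomial standard deviations)
    """
    relaxed = alpha_emp + k * math.sqrt(alpha_emp * (1.0 - alpha_emp) / window)
    recent = deque(maxlen=window)
    alarms = []
    for idx, failed in enumerate(outcomes):
        recent.append(bool(failed))
        if len(recent) == window and sum(recent) / window > relaxed:
            alarms.append(idx)
    return alarms

# Isolated failures stay below the relaxed level; a sustained run does not.
quiet = [i % 100 == 0 for i in range(300)]          # 1% isolated failures
burst = quiet[:200] + [True] * 20 + quiet[220:]     # 20 consecutive failures
print(trend_alarms(quiet, alpha_emp=0.01))          # -> []
print(len(trend_alarms(burst, alpha_emp=0.01)) > 0) # -> True
```

Acting only on a sustained excess over the empirical false-alarm rate lowers the chance that a single false alarm triggers a decision, at the cost of a slower reaction.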



VI. Conclusion 

The proposed decentralized implementations of the fake 
traffic provide desired statistical source anonymity with the 
minimal overhead and a delay that depends only on the 
efficiency of packet routing. Simultaneously, they utilize the 
network resources in a balanced and fair way, and provide 
flexibility necessary to handle different temporal and spatial 
profiles of the event process. By designing only one parameter,
the size of the dummy population d, according to the known
statistical characteristics of the observed process, we achieve
such flexibility. In this paper, we discussed the minimal
dummy population for several statistical models of the event.
The minimal value of d depends on the implementation, as 
the deployment of group implementation requires d to be at 
least -pT^. Uniform spatial distribution of events does not call 
for the group implementation, and this constraint on d does 
not hold. However, d needs to be large enough to render $\mathcal{E}_{\mu/d}$
and $\mathcal{E}_{\mu/(d+1)}$ statistically indistinguishable. Finally, if the event
process includes occasional bursts, the minimal value of d is
defined by the ratio of the expected inter-burst time $\mu$ and
the expected intra-burst time (i.e., pause). Detailed analysis is
omitted due to space constraints.

Our future work is to formalize a metric for the goodness
of a WSN source anonymity scheme that includes Eve's
outage probability and the work she needs to collect statistically
relevant samples. By including the adversary's work and
vulnerabilities, we aim to better model a global eavesdropper,
and to present the quality of the source anonymity protection
as relative to the adversary's strength. In addition, the goodness
metric should include the statistically guaranteed anonymity
level, the work spent to obfuscate the events, and the latency
guaranteed by the proposed algorithm.